Importing All Required Libraries

In [99]:

Load the dataset

In [100]:
Out[100]:
compactness circularity distance_circularity radius_ratio pr.axis_aspect_ratio max.length_aspect_ratio scatter_ratio elongatedness pr.axis_rectangularity max.length_rectangularity scaled_variance scaled_variance.1 scaled_radius_of_gyration scaled_radius_of_gyration.1 skewness_about skewness_about.1 skewness_about.2 hollows_ratio class
0 95 48.0 83.0 178.0 72.0 10 162.0 42.0 20.0 159 176.0 379.0 184.0 70.0 6.0 16.0 187.0 197 van
1 91 41.0 84.0 141.0 57.0 9 149.0 45.0 19.0 143 170.0 330.0 158.0 72.0 9.0 14.0 189.0 199 van
2 104 50.0 106.0 209.0 66.0 10 207.0 32.0 23.0 158 223.0 635.0 220.0 73.0 14.0 9.0 188.0 196 car
3 93 41.0 82.0 159.0 63.0 9 144.0 46.0 19.0 143 160.0 309.0 127.0 63.0 6.0 10.0 199.0 207 van
4 85 44.0 70.0 205.0 103.0 52 149.0 45.0 19.0 144 241.0 325.0 188.0 127.0 9.0 11.0 180.0 183 bus
5 107 NaN 106.0 172.0 50.0 6 255.0 26.0 28.0 169 280.0 957.0 264.0 85.0 5.0 9.0 181.0 183 bus
6 97 43.0 73.0 173.0 65.0 6 153.0 42.0 19.0 143 176.0 361.0 172.0 66.0 13.0 1.0 200.0 204 bus
7 90 43.0 66.0 157.0 65.0 9 137.0 48.0 18.0 146 162.0 281.0 164.0 67.0 3.0 3.0 193.0 202 van
8 86 34.0 62.0 140.0 61.0 7 122.0 54.0 17.0 127 141.0 223.0 112.0 64.0 2.0 14.0 200.0 208 van
9 93 44.0 98.0 NaN 62.0 11 183.0 36.0 22.0 146 202.0 505.0 152.0 64.0 4.0 14.0 195.0 204 car
In [3]:
Out[3]:
compactness circularity distance_circularity radius_ratio pr.axis_aspect_ratio max.length_aspect_ratio scatter_ratio elongatedness pr.axis_rectangularity max.length_rectangularity scaled_variance scaled_variance.1 scaled_radius_of_gyration scaled_radius_of_gyration.1 skewness_about skewness_about.1 skewness_about.2 hollows_ratio class
836 87 45.0 66.0 139.0 58.0 8 140.0 47.0 18.0 148 168.0 294.0 175.0 73.0 3.0 12.0 188.0 196 van
837 94 46.0 77.0 169.0 60.0 8 158.0 42.0 20.0 148 181.0 373.0 181.0 67.0 12.0 2.0 193.0 199 car
838 95 43.0 76.0 142.0 57.0 10 151.0 44.0 19.0 149 173.0 339.0 159.0 71.0 2.0 23.0 187.0 200 van
839 90 44.0 72.0 157.0 64.0 8 137.0 48.0 18.0 144 159.0 283.0 171.0 65.0 9.0 4.0 196.0 203 van
840 93 34.0 66.0 140.0 56.0 7 130.0 51.0 18.0 120 151.0 251.0 114.0 62.0 5.0 29.0 201.0 207 car
841 93 39.0 87.0 183.0 64.0 8 169.0 40.0 20.0 134 200.0 422.0 149.0 72.0 7.0 25.0 188.0 195 car
842 89 46.0 84.0 163.0 66.0 11 159.0 43.0 20.0 159 173.0 368.0 176.0 72.0 1.0 20.0 186.0 197 van
843 106 54.0 101.0 222.0 67.0 12 222.0 30.0 25.0 173 228.0 721.0 200.0 70.0 3.0 4.0 187.0 201 car
844 86 36.0 78.0 146.0 58.0 7 135.0 50.0 18.0 124 155.0 270.0 148.0 66.0 0.0 25.0 190.0 195 car
845 85 36.0 66.0 123.0 55.0 5 120.0 56.0 17.0 128 140.0 212.0 131.0 73.0 1.0 18.0 186.0 190 van
In [31]:
Index(['compactness', 'circularity', 'distance_circularity', 'radius_ratio',
       'pr.axis_aspect_ratio', 'max.length_aspect_ratio', 'scatter_ratio',
       'elongatedness', 'pr.axis_rectangularity', 'max.length_rectangularity',
       'scaled_variance', 'scaled_variance.1', 'scaled_radius_of_gyration',
       'scaled_radius_of_gyration.1', 'skewness_about', 'skewness_about.1',
       'skewness_about.2', 'hollows_ratio', 'class'],
      dtype='object')
(846, 19)

Exploratory data quality report

In [4]:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 846 entries, 0 to 845
Data columns (total 19 columns):
compactness                    846 non-null int64
circularity                    841 non-null float64
distance_circularity           842 non-null float64
radius_ratio                   840 non-null float64
pr.axis_aspect_ratio           844 non-null float64
max.length_aspect_ratio        846 non-null int64
scatter_ratio                  845 non-null float64
elongatedness                  845 non-null float64
pr.axis_rectangularity         843 non-null float64
max.length_rectangularity      846 non-null int64
scaled_variance                843 non-null float64
scaled_variance.1              844 non-null float64
scaled_radius_of_gyration      844 non-null float64
scaled_radius_of_gyration.1    842 non-null float64
skewness_about                 840 non-null float64
skewness_about.1               845 non-null float64
skewness_about.2               845 non-null float64
hollows_ratio                  846 non-null int64
class                          846 non-null object
dtypes: float64(14), int64(4), object(1)
memory usage: 125.7+ KB

Quick Insights

1. compactness, max.length_aspect_ratio, max.length_rectangularity, hollows_ratio and class have no missing values; every other feature has between 1 and 6 missing values.
2. All columns are numeric (int64 or float64) except class, which is stored as object (string labels).

Identifying missing values

In [40]:
Index(['compactness', 'circularity', 'distance_circularity', 'radius_ratio',
       'pr.axis_aspect_ratio', 'max.length_aspect_ratio', 'scatter_ratio',
       'elongatedness', 'pr.axis_rectangularity', 'max.length_rectangularity',
       'scaled_variance', 'scaled_variance.1', 'scaled_radius_of_gyration',
       'scaled_radius_of_gyration.1', 'skewness_about', 'skewness_about.1',
       'skewness_about.2', 'hollows_ratio', 'class'],
      dtype='object')
Out[40]:
compactness circularity distance_circularity radius_ratio pr.axis_aspect_ratio max.length_aspect_ratio scatter_ratio elongatedness pr.axis_rectangularity max.length_rectangularity scaled_variance scaled_variance.1 scaled_radius_of_gyration scaled_radius_of_gyration.1 skewness_about skewness_about.1 skewness_about.2 hollows_ratio class
count 846.000000 846.000000 846.000000 846.000000 846.000000 846.000000 846.000000 846.000000 846.000000 846.000000 846.000000 846.000000 846.000000 846.000000 846.000000 846.000000 846.000000 846.000000 846.000000
mean 93.678487 44.823877 82.100473 168.874704 61.677305 8.567376 168.887707 40.936170 20.580378 147.998818 188.596927 439.314421 174.706856 72.443262 6.361702 12.600473 188.918440 195.632388 0.977541
std 8.234474 6.134272 15.741569 33.401356 7.882188 4.601217 33.197710 7.811882 2.588558 14.515652 31.360427 176.496341 32.546277 7.468734 4.903244 8.930962 6.152247 7.438797 0.702130
min 73.000000 33.000000 40.000000 104.000000 47.000000 2.000000 112.000000 26.000000 17.000000 118.000000 130.000000 184.000000 109.000000 59.000000 0.000000 0.000000 176.000000 181.000000 0.000000
25% 87.000000 40.000000 70.000000 141.000000 57.000000 7.000000 147.000000 33.000000 19.000000 137.000000 167.000000 318.250000 149.000000 67.000000 2.000000 5.000000 184.000000 190.250000 0.000000
50% 93.000000 44.000000 80.000000 167.000000 61.000000 8.000000 157.000000 43.000000 20.000000 146.000000 179.000000 363.500000 173.500000 71.500000 6.000000 11.000000 188.000000 197.000000 1.000000
75% 100.000000 49.000000 98.000000 195.000000 65.000000 10.000000 198.000000 46.000000 23.000000 159.000000 217.000000 586.750000 198.000000 75.000000 9.000000 19.000000 193.000000 201.000000 1.000000
max 119.000000 59.000000 112.000000 333.000000 138.000000 55.000000 265.000000 61.000000 29.000000 188.000000 320.000000 1018.000000 268.000000 135.000000 22.000000 41.000000 206.000000 211.000000 2.000000
In [47]:
Original null values count
 compactness                    0
circularity                    5
distance_circularity           4
radius_ratio                   6
pr.axis_aspect_ratio           2
max.length_aspect_ratio        0
scatter_ratio                  1
elongatedness                  1
pr.axis_rectangularity         3
max.length_rectangularity      0
scaled_variance                3
scaled_variance.1              2
scaled_radius_of_gyration      2
scaled_radius_of_gyration.1    4
skewness_about                 6
skewness_about.1               1
skewness_about.2               1
hollows_ratio                  0
class                          0
dtype: int64


Count after we imputed the NaN value:
 compactness                    0
circularity                    0
distance_circularity           0
radius_ratio                   0
pr.axis_aspect_ratio           0
max.length_aspect_ratio        0
scatter_ratio                  0
elongatedness                  0
pr.axis_rectangularity         0
max.length_rectangularity      0
scaled_variance                0
scaled_variance.1              0
scaled_radius_of_gyration      0
scaled_radius_of_gyration.1    0
skewness_about                 0
skewness_about.1               0
skewness_about.2               0
hollows_ratio                  0
class                          0
dtype: int64
Observation:

We can see that the missing (NaN) values in the columns of our original vehdf DataFrame have been replaced using the median strategy.
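The imputation cell itself is not shown, so the following is only a minimal sketch of median imputation with pandas. The toy frame borrows a few values from the head of the dataset; the variable name `vehdf` follows the observation above, and everything else is an assumption for illustration.

```python
import numpy as np
import pandas as pd

# Toy stand-in for vehdf (assumption): a few values from the head of the dataset.
vehdf = pd.DataFrame({
    "compactness": [95, 91, 104, 107, 93],
    "circularity": [48.0, 41.0, 50.0, np.nan, 44.0],
    "radius_ratio": [178.0, 141.0, 209.0, 172.0, np.nan],
    "class": ["van", "van", "car", "bus", "car"],
})

print("Original null values count\n", vehdf.isnull().sum())

# Fill the NaNs of every numeric column with that column's median.
numeric_cols = vehdf.select_dtypes(include="number").columns
vehdf[numeric_cols] = vehdf[numeric_cols].fillna(vehdf[numeric_cols].median())

print("Count after we imputed the NaN values:\n", vehdf.isnull().sum())
```

The median is less sensitive to extreme values than the mean, which matters here because several columns turn out to have outliers.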

Descriptive statistical summary

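The describe() function reports the count, mean, standard deviation, minimum, maximum and the quartiles (25%, 50%, 75%) of each numeric column; the IQR is the difference between the 75% and 25% values. Non-numeric columns are excluded by default. class appears in the table below with values from 0 to 2, which suggests it had been encoded as integers by this point.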

In [49]:
Out[49]:
count mean std min 25% 50% 75% max
compactness 846.0 93.678487 8.234474 73.0 87.00 93.0 100.00 119.0
circularity 846.0 44.823877 6.134272 33.0 40.00 44.0 49.00 59.0
distance_circularity 846.0 82.100473 15.741569 40.0 70.00 80.0 98.00 112.0
radius_ratio 846.0 168.874704 33.401356 104.0 141.00 167.0 195.00 333.0
pr.axis_aspect_ratio 846.0 61.677305 7.882188 47.0 57.00 61.0 65.00 138.0
max.length_aspect_ratio 846.0 8.567376 4.601217 2.0 7.00 8.0 10.00 55.0
scatter_ratio 846.0 168.887707 33.197710 112.0 147.00 157.0 198.00 265.0
elongatedness 846.0 40.936170 7.811882 26.0 33.00 43.0 46.00 61.0
pr.axis_rectangularity 846.0 20.580378 2.588558 17.0 19.00 20.0 23.00 29.0
max.length_rectangularity 846.0 147.998818 14.515652 118.0 137.00 146.0 159.00 188.0
scaled_variance 846.0 188.596927 31.360427 130.0 167.00 179.0 217.00 320.0
scaled_variance.1 846.0 439.314421 176.496341 184.0 318.25 363.5 586.75 1018.0
scaled_radius_of_gyration 846.0 174.706856 32.546277 109.0 149.00 173.5 198.00 268.0
scaled_radius_of_gyration.1 846.0 72.443262 7.468734 59.0 67.00 71.5 75.00 135.0
skewness_about 846.0 6.361702 4.903244 0.0 2.00 6.0 9.00 22.0
skewness_about.1 846.0 12.600473 8.930962 0.0 5.00 11.0 19.00 41.0
skewness_about.2 846.0 188.918440 6.152247 176.0 184.00 188.0 193.00 206.0
hollows_ratio 846.0 195.632388 7.438797 181.0 190.25 197.0 201.00 211.0
class 846.0 0.977541 0.702130 0.0 0.00 1.0 1.00 2.0
Observation

compactness and circularity have nearly equal mean and median values (93.68 vs 93.0 and 44.82 vs 44.0), which suggests roughly symmetric distributions with little skew. On its own this does not establish normality or rule out outliers.

In [64]:

Observation:

Most of the attributes appear to be roughly normally distributed; scaled_variance.1, skewness_about.1, skewness_about.2 and scatter_ratio appear to be right-skewed.
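A visual impression of skew can be checked numerically. The sketch below uses two hypothetical toy columns (not the vehicle data) to show how `DataFrame.skew()` and the mean-minus-median gap behave for a symmetric and a right-skewed column.

```python
import pandas as pd

# Hypothetical toy columns: one symmetric, one with a long right tail.
df = pd.DataFrame({
    "symmetric": [1, 2, 3, 4, 5, 6, 7],
    "right_skewed": [1, 1, 2, 2, 3, 4, 30],
})

skew_values = df.skew()  # sample skewness of each numeric column
print("skewValue of dataframe attributes:\n", skew_values)

# A long right tail pulls the mean above the median.
mean_minus_median = df.mean() - df.median()
print(mean_minus_median)
```

A skewness close to 0 indicates a roughly symmetric column, while clearly positive values indicate a right tail.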

In [69]:
In [71]:
skewValue of dataframe attributes:
 compactness                    0.381271
circularity                    0.264928
distance_circularity           0.108718
radius_ratio                   0.397572
pr.axis_aspect_ratio           3.835392
max.length_aspect_ratio        6.778394
scatter_ratio                  0.608710
elongatedness                  0.046951
pr.axis_rectangularity         0.774406
max.length_rectangularity      0.256359
scaled_variance                0.655598
scaled_variance.1              0.845345
scaled_radius_of_gyration      0.279910
scaled_radius_of_gyration.1    2.089979
skewness_about                 0.780813
skewness_about.1               0.689014
skewness_about.2               0.249985
hollows_ratio                 -0.226341
class                          0.031106
dtype: float64
In [74]:
In [77]:
In [78]:
In [79]:

Observation on boxplots:

pr.axis_aspect_ratio, max.length_aspect_ratio, radius_ratio, scaled_variance.1, scaled_radius_of_gyration.1, skewness_about and skewness_about.1 are some of the attributes with outliers.

IQR

In [82]:
Out[82]:
(846, 19)
In [80]:
compactness                     13.00
circularity                      9.00
distance_circularity            28.00
radius_ratio                    54.00
pr.axis_aspect_ratio             8.00
max.length_aspect_ratio          3.00
scatter_ratio                   51.00
elongatedness                   13.00
pr.axis_rectangularity           4.00
max.length_rectangularity       22.00
scaled_variance                 50.00
scaled_variance.1              268.50
scaled_radius_of_gyration       49.00
scaled_radius_of_gyration.1      8.00
skewness_about                   7.00
skewness_about.1                14.00
skewness_about.2                 9.00
hollows_ratio                   10.75
class                            1.00
dtype: float64
In [83]:
Out[83]:
(813, 19)

Verification of outliers removal

In [84]:

Note

We can see from the boxplots that the outliers have been removed from all the attributes that had them. Since the number of outliers was small (846 rows reduced to 813), we opted to remove those rows. Generally we avoid this, as it can lead to information loss in data sets with a large number of outliers.
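The removal cell is not shown; one common way to drop outliers with the IQR rule is sketched below. The 1.5 × IQR threshold and the toy values are assumptions (55 and 138 mimic the extreme maxima in the summary table), so this need not reproduce the 813 rows reported above.

```python
import pandas as pd

# Hypothetical toy frame with one extreme value per column.
df = pd.DataFrame({
    "max.length_aspect_ratio": [7, 8, 9, 10, 8, 7, 9, 55],
    "pr.axis_aspect_ratio": [57, 61, 65, 60, 138, 62, 58, 63],
})

q1 = df.quantile(0.25)
q3 = df.quantile(0.75)
iqr = q3 - q1

# Flag values outside the 1.5 * IQR whiskers, then keep rows with no flagged value.
outlier_mask = (df < (q1 - 1.5 * iqr)) | (df > (q3 + 1.5 * iqr))
cleaned = df[~outlier_mask.any(axis=1)]

print(df.shape, "->", cleaned.shape)
```

Note that a row is dropped if any single column is flagged, so the number of rows lost can grow quickly as more columns are checked.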

Understanding the relationship between all independent attributes:

Data Correlation: a way to understand the relationship between multiple variables and attributes in your dataset. Using correlation, you can get insights such as:

One or multiple attributes may depend on another attribute (correlation alone does not establish that one causes another).

One or multiple attributes are associated with other attributes.

Pearson and Spearman are two statistical methods to calculate the strength of correlation between two variables or attributes. The Pearson correlation coefficient is suited to continuous variables that have a linear relationship; the Spearman coefficient works on ranks and captures any monotonic relationship.
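To make the difference between the two coefficients concrete, here is a small sketch on hypothetical toy data (not the vehicle dataset), where y increases with x but not linearly.

```python
import pandas as pd

# Hypothetical toy data: y is a monotonic but non-linear function of x.
toy = pd.DataFrame({"x": [1, 2, 3, 4, 5, 6]})
toy["y"] = toy["x"] ** 3

pearson = toy.corr(method="pearson")    # strength of the linear association
spearman = toy.corr(method="spearman")  # strength of the monotonic (rank) association

print("Pearson :", pearson.loc["x", "y"])
print("Spearman:", spearman.loc["x", "y"])
```

Spearman is exactly 1 here because the ranks of x and y agree perfectly, while Pearson falls below 1 because the relationship is curved.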

In [87]:

Observation

Strong/fair correlation
  - scaled_variance & scaled_variance.1 seem to be strongly correlated, with a value of 0.98
  - skewness_about.2 and hollows_ratio seem to be strongly correlated, corr coeff: 0.89
  - distance_circularity and radius_ratio seem to have high positive correlation, corr coeff: 0.81
  - compactness & circularity and radius_ratio & pr.axis_aspect_ratio seem to be moderately correlated, coeff: 0.67
  - scaled_variance & scaled_radius_of_gyration and circularity & distance_circularity also seem to be highly correlated, corr coeff: 0.79
  - pr.axis_rectangularity and max.length_rectangularity also seem to be strongly correlated, coeff: 0.81
  - scatter_ratio and elongatedness seem to have strong negative correlation, val: -0.97
  - elongatedness and pr.axis_rectangularity seem to have strong negative correlation, val: -0.95
Little/poor correlation
  - max.length_aspect_ratio & radius_ratio have average correlation, coeff: 0.5
  - pr.axis_aspect_ratio & max.length_aspect_ratio seem to have very little correlation
  - scaled_radius_of_gyration & scaled_radius_of_gyration.1 seem to have very little correlation
  - scaled_radius_of_gyration.1 & skewness_about seem to have very little correlation
  - skewness_about & skewness_about.1 are not correlated
  - skewness_about.1 and skewness_about.2 are not correlated

Pairplot Analysis:

In [88]:
Out[88]:
<seaborn.axisgrid.PairGrid at 0x17400844b00>
Quick insights:

The pairplot seems to confirm what we observed in the correlation heatmap. scaled_variance & scaled_variance.1 seem to have a very strong positive correlation, with a value of 0.98. skewness_about.2 and hollows_ratio also seem to have a strong positive correlation, with coeff: 0.89.

scatter_ratio and elongatedness seem to have a very strong negative correlation. elongatedness and pr.axis_rectangularity also seem to have a strong negative correlation.

Choosing the right attributes which can be the right choice for model building


In [89]:

Principal Component Analysis(PCA):

PCA is a dimensionality reduction method that transforms a large set of (often correlated) variables into a smaller set of uncorrelated variables, called principal components, which retain most of the relevant information (variance) in the data.

Separate The Data Into Independent & Dependent Attributes

In [115]:
Out[115]:
array([[ 95.,  48.,  83., ...,  16., 187., 197.],
       [ 91.,  41.,  84., ...,  14., 189., 199.],
       [104.,  50., 106., ...,   9., 188., 196.],
       ...,
       [106.,  54., 101., ...,   4., 187., 201.],
       [ 86.,  36.,  78., ...,  25., 190., 195.],
       [ 85.,  36.,  66., ...,  18., 186., 190.]])
In [116]:


In [117]:
cov_matrix shape: (18, 18)
Covariance_matrix [[ 1.00118343  0.68569786  0.79086299  0.69055952  0.09164265  0.14842463
   0.81358214 -0.78968322  0.81465658  0.67694334  0.76297234  0.81497566
   0.58593517 -0.24988794  0.23635777  0.15720044  0.29889034  0.36598446]
 [ 0.68569786  1.00118343  0.79325751  0.6216467   0.15396023  0.25176438
   0.8489411  -0.82244387  0.84439802  0.96245572  0.79724837  0.83693508
   0.92691166  0.05200785  0.14436828 -0.01145212 -0.10455005  0.04640562]
 [ 0.79086299  0.79325751  1.00118343  0.76794246  0.15864319  0.26499957
   0.90614687 -0.9123854   0.89408198  0.77544391  0.86253904  0.88706577
   0.70660663 -0.22621115  0.1140589   0.26586088  0.14627113  0.33312625]
 [ 0.69055952  0.6216467   0.76794246  1.00118343  0.66423242  0.45058426
   0.73529816 -0.79041561  0.70922371  0.56962256  0.79435372  0.71928618
   0.53700678 -0.18061084  0.04877032  0.17394649  0.38266622  0.47186659]
 [ 0.09164265  0.15396023  0.15864319  0.66423242  1.00118343  0.64949139
   0.10385472 -0.18325156  0.07969786  0.1270594   0.27323306  0.08929427
   0.12211524  0.15313091 -0.05843967 -0.0320139   0.24016968  0.26804208]
 [ 0.14842463  0.25176438  0.26499957  0.45058426  0.64949139  1.00118343
   0.16638787 -0.18035326  0.16169312  0.30630475  0.31933428  0.1434227
   0.18996732  0.29608463  0.01561769  0.04347324 -0.02611148  0.14408905]
 [ 0.81358214  0.8489411   0.90614687  0.73529816  0.10385472  0.16638787
   1.00118343 -0.97275069  0.99092181  0.81004084  0.94978498  0.9941867
   0.80082111 -0.02757446  0.07454578  0.21267959  0.00563439  0.1189581 ]
 [-0.78968322 -0.82244387 -0.9123854  -0.79041561 -0.18325156 -0.18035326
  -0.97275069  1.00118343 -0.95011894 -0.77677186 -0.93748998 -0.95494487
  -0.76722075  0.10342428 -0.05266193 -0.18527244 -0.11526213 -0.2171615 ]
 [ 0.81465658  0.84439802  0.89408198  0.70922371  0.07969786  0.16169312
   0.99092181 -0.95011894  1.00118343  0.81189327  0.93533261  0.98938264
   0.79763248 -0.01551372  0.08386628  0.21495454 -0.01867064  0.09940372]
 [ 0.67694334  0.96245572  0.77544391  0.56962256  0.1270594   0.30630475
   0.81004084 -0.77677186  0.81189327  1.00118343  0.74586628  0.79555492
   0.86747579  0.04167099  0.13601231  0.00136727 -0.10407076  0.07686047]
 [ 0.76297234  0.79724837  0.86253904  0.79435372  0.27323306  0.31933428
   0.94978498 -0.93748998  0.93533261  0.74586628  1.00118343  0.94679667
   0.77983844  0.11321163  0.03677248  0.19446837  0.01423606  0.08579656]
 [ 0.81497566  0.83693508  0.88706577  0.71928618  0.08929427  0.1434227
   0.9941867  -0.95494487  0.98938264  0.79555492  0.94679667  1.00118343
   0.79595778 -0.01541878  0.07696823  0.20104818  0.00622636  0.10305714]
 [ 0.58593517  0.92691166  0.70660663  0.53700678  0.12211524  0.18996732
   0.80082111 -0.76722075  0.79763248  0.86747579  0.77983844  0.79595778
   1.00118343  0.19169941  0.16667971 -0.05621953 -0.22471583 -0.11814142]
 [-0.24988794  0.05200785 -0.22621115 -0.18061084  0.15313091  0.29608463
  -0.02757446  0.10342428 -0.01551372  0.04167099  0.11321163 -0.01541878
   0.19169941  1.00118343 -0.08846001 -0.12633227 -0.749751   -0.80307227]
 [ 0.23635777  0.14436828  0.1140589   0.04877032 -0.05843967  0.01561769
   0.07454578 -0.05266193  0.08386628  0.13601231  0.03677248  0.07696823
   0.16667971 -0.08846001  1.00118343 -0.03503155  0.1154338   0.09724079]
 [ 0.15720044 -0.01145212  0.26586088  0.17394649 -0.0320139   0.04347324
   0.21267959 -0.18527244  0.21495454  0.00136727  0.19446837  0.20104818
  -0.05621953 -0.12633227 -0.03503155  1.00118343  0.07740174  0.20523257]
 [ 0.29889034 -0.10455005  0.14627113  0.38266622  0.24016968 -0.02611148
   0.00563439 -0.11526213 -0.01867064 -0.10407076  0.01423606  0.00622636
  -0.22471583 -0.749751    0.1154338   0.07740174  1.00118343  0.89363767]
 [ 0.36598446  0.04640562  0.33312625  0.47186659  0.26804208  0.14408905
   0.1189581  -0.2171615   0.09940372  0.07686047  0.08579656  0.10305714
  -0.11814142 -0.80307227  0.09724079  0.20523257  0.89363767  1.00118343]]
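The diagonal entries above are 1.00118343 rather than exactly 1. That value equals 846/845, which is what one would expect if the features were standardised with the population standard deviation (as sklearn's `StandardScaler` does) and the covariance was then computed with the default n - 1 divisor of `np.cov`. The sketch below reproduces this on random stand-in data; the pipeline is an assumption, since the original cell is not shown.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical stand-in for the 846 x 18 feature matrix.
X = rng.normal(loc=100.0, scale=15.0, size=(846, 18))

# Standardise each column with the population standard deviation (ddof=0).
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# np.cov expects variables in rows and divides by n - 1 by default.
cov_matrix = np.cov(X_std.T)

print("cov_matrix shape:", cov_matrix.shape)
print("diagonal entry:", cov_matrix[0, 0])  # n / (n - 1) = 846 / 845
```

Up to that constant factor, the covariance matrix of standardised data is the Pearson correlation matrix of the original features.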

Calculating Eigen Vectors & Eigen Values: Using numpy linear algebra function

In [118]:
Eigen Vectors
[[ 2.75283688e-01  1.26953763e-01  1.19922479e-01 -7.83843562e-02
  -6.95178336e-02  1.44875476e-01  4.51862331e-01  5.66136785e-01
   4.84418105e-01  2.60076393e-01 -4.65342885e-02  1.20344026e-02
  -1.56136836e-01 -1.00728764e-02 -6.00532537e-03  6.00485194e-02
  -6.50956666e-02 -9.67780251e-03]
 [ 2.93258469e-01 -1.25576727e-01  2.48205467e-02 -1.87337408e-01
   8.50649539e-02 -3.02731148e-01 -2.49103387e-01  1.79851809e-01
   1.41569001e-02 -9.80779086e-02 -3.01323693e-03 -2.13635088e-01
  -1.50116709e-02 -9.15939674e-03  7.38059396e-02 -4.26993118e-01
  -2.61244802e-01 -5.97862837e-01]
 [ 3.04609128e-01  7.29516436e-02  5.60143254e-02  7.12008427e-02
  -4.06645651e-02 -1.38405773e-01  7.40350569e-02 -4.34748988e-01
   1.67572478e-01  2.05031597e-01 -7.06489498e-01  3.46330345e-04
   2.37111452e-01  6.94599696e-03 -2.50791236e-02  1.46240270e-01
   7.82651714e-02 -1.57257142e-01]
 [ 2.67606877e-01  1.89634378e-01 -2.75074211e-01  4.26053415e-02
   4.61473714e-02  2.48136636e-01 -1.76912814e-01 -1.01998360e-01
   2.30313563e-01  4.77888949e-02  1.07151583e-01 -1.57049977e-01
   3.07818692e-02 -4.20156482e-02 -3.59880417e-02 -5.21374718e-01
   5.60792139e-01  1.66551725e-01]
 [ 8.05039890e-02  1.22174860e-01 -6.42012966e-01 -3.27257119e-02
   4.05494487e-02  2.36932611e-01 -3.97876601e-01  6.87147927e-02
   2.77128307e-01 -1.08075009e-01 -3.85169721e-02  1.10106595e-01
   3.92804479e-02  3.12698087e-02  1.25847434e-02  3.63120360e-01
  -3.22276873e-01 -6.36138719e-02]
 [ 9.72756855e-02 -1.07482875e-02 -5.91801304e-01 -3.14147277e-02
  -2.13432566e-01 -4.19330747e-01  5.03413610e-01 -1.61153097e-01
  -1.48032250e-01  1.18266345e-01  2.62254132e-01 -1.32935328e-01
  -3.72884301e-02 -9.99915816e-03 -2.84168792e-02  6.27796802e-02
   4.87809642e-02 -8.63169844e-02]
 [ 3.17092750e-01 -4.81181371e-02  9.76283108e-02  9.57485748e-02
   1.54853055e-02  1.16100153e-01  6.49879382e-02 -1.00688056e-01
  -5.44574214e-02 -1.65167200e-01  1.70405800e-01  9.55883216e-02
  -3.94638419e-02  8.40975659e-01 -2.49652703e-01  6.40502241e-02
   1.81839668e-02 -7.98693109e-02]
 [-3.14133155e-01 -1.27498515e-02 -5.76484384e-02 -8.22901952e-02
  -7.68518712e-02 -1.41840112e-01  1.38112945e-02  2.15497166e-01
   1.56867362e-01  1.51612333e-01  5.76632611e-02  1.22012715e-01
   8.10394855e-01  2.38188639e-01 -4.21478467e-02 -1.86946145e-01
  -2.50330194e-02  4.21515054e-02]
 [ 3.13959064e-01 -5.99352482e-02  1.09512416e-01  9.24582989e-02
  -2.17633157e-03  9.80561329e-02  9.66573058e-02 -6.35933915e-02
  -5.24978759e-03 -1.93777917e-01  2.72514033e-01  2.51281206e-01
   2.71573184e-01 -1.01154594e-01  7.17396292e-01  1.80912790e-01
   1.64490784e-01 -1.44490635e-01]
 [ 2.82830900e-01 -1.16220532e-01  1.70641987e-02 -1.88005612e-01
   6.06366845e-02 -4.61674972e-01 -1.04552173e-01  2.49495867e-01
   6.10362445e-02 -4.69059999e-01 -1.41434233e-01 -1.24529334e-01
   7.57105808e-02 -1.69481636e-02 -4.70233017e-02  1.74070296e-01
   1.47280090e-01  5.11259153e-01]
 [ 3.09280359e-01 -6.22806229e-02 -5.63239801e-02  1.19844008e-01
   4.56472367e-04  2.36225434e-01  1.14622578e-01 -5.02096319e-02
  -2.97588112e-01  1.29986011e-01 -7.72596638e-02 -2.15011644e-01
   1.53180808e-01  6.04665108e-03  1.71503771e-01 -2.77272123e-01
  -5.64444637e-01  4.53236855e-01]
 [ 3.13788457e-01 -5.37843596e-02  1.08840729e-01  9.17449325e-02
   1.95548315e-02  1.57820194e-01  8.37350220e-02 -4.37649907e-02
  -8.33669838e-02 -1.58203940e-01  2.43226301e-01  1.75685051e-01
   3.07948154e-01 -4.69202757e-01 -6.16589383e-01  7.85141734e-02
  -6.85856929e-02 -1.26992250e-01]
 [ 2.72047492e-01 -2.09233172e-01  3.14636493e-02 -2.00095228e-01
   6.15991681e-02 -1.35576278e-01 -3.73944382e-01  1.08474496e-01
  -2.41655483e-01  6.86493700e-01  1.58888394e-01  1.90336498e-01
  -3.76087492e-02  1.17483082e-02 -2.64910290e-02  2.00683948e-01
   1.47099233e-01  1.09982525e-01]
 [-2.08137692e-02 -4.88525148e-01 -2.86277015e-01  6.55051354e-02
  -1.45530146e-01  2.41356821e-01  1.11952983e-01  3.40878491e-01
  -3.20221887e-01 -1.27648385e-01 -4.19188664e-01  2.85710601e-01
  -4.34650674e-02  3.14812146e-03 -1.42959461e-02 -1.46861607e-01
   2.32941262e-01 -1.11271959e-01]
 [ 4.14555082e-02  5.50899716e-02  1.15679354e-01 -6.04794251e-01
  -7.29189842e-01  2.03209257e-01 -8.06328902e-02 -1.56487670e-01
  -2.21054148e-02 -9.83643219e-02  1.25447648e-02 -1.60327156e-03
  -9.94304634e-03 -3.03156233e-03  1.74310271e-03 -1.73360301e-02
  -2.77589170e-02  2.40943096e-02]
 [ 5.82250207e-02  1.24085090e-01  7.52828901e-02  6.66114117e-01
  -5.99196401e-01 -1.91960802e-01 -2.84558723e-01  2.08774083e-01
  -1.01761758e-02  3.55150608e-02  3.27808069e-02 -8.32589542e-02
  -2.68915150e-02 -1.25315953e-02 -7.08894692e-03  3.13689218e-02
   2.78187408e-03 -9.89651885e-03]
 [ 3.02795063e-02  5.40914775e-01 -8.73592034e-03 -1.05526253e-01
   1.00602332e-01  1.56939174e-01  1.81451818e-02  3.04580219e-01
  -5.17222779e-01 -1.93956186e-02 -1.20597635e-01 -3.53723696e-01
   1.86595152e-01  4.34282436e-02  7.67874680e-03  2.31451048e-01
   1.90629960e-01 -1.82212045e-01]
 [ 7.41453913e-02  5.40354258e-01 -3.95242743e-02 -4.74890311e-02
   2.98614819e-02 -2.41222817e-01  1.57237839e-02  3.04186304e-02
  -1.71506343e-01 -6.41314778e-02 -9.19597847e-02  6.85618161e-01
  -1.42380007e-01 -6.47700819e-03  6.37681817e-03 -2.88502234e-01
  -1.20966490e-01  9.04014702e-02]]

Eigen Values
[9.40460261e+00 3.01492206e+00 1.90352502e+00 1.17993747e+00
 9.17260633e-01 5.39992629e-01 3.58870118e-01 2.21932456e-01
 1.60608597e-01 9.18572234e-02 6.64994118e-02 4.66005994e-02
 3.57947189e-02 2.96445743e-03 1.00257898e-02 2.74120657e-02
 1.79166314e-02 2.05792871e-02]
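As a rough illustration of this step, the sketch below runs `np.linalg.eig` on a small symmetric matrix standing in for the 18 x 18 covariance matrix (the 2 x 2 values are an assumption chosen so the result is easy to check by hand).

```python
import numpy as np

# Small symmetric matrix standing in for the covariance matrix.
cov_matrix = np.array([[2.0, 1.0],
                       [1.0, 2.0]])

# eig returns the eigenvalues and a matrix whose COLUMNS are the eigenvectors.
eig_vals, eig_vecs = np.linalg.eig(cov_matrix)

print("Eigen Vectors\n", eig_vecs)
print("Eigen Values\n", eig_vals)

# Each pair satisfies cov_matrix @ v = lambda * v.
for i in range(len(eig_vals)):
    assert np.allclose(cov_matrix @ eig_vecs[:, i], eig_vals[i] * eig_vecs[:, i])
```

`np.linalg.eig` does not guarantee any ordering of the eigenvalues, which is why the output above is not sorted. For a symmetric matrix such as a covariance matrix, `np.linalg.eigh` is an alternative that returns them in ascending order.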

Sort eigenvalues in descending order

In [119]:
[(9.404602609088705, array([ 0.27528369,  0.29325847,  0.30460913,  0.26760688,  0.08050399,
        0.09727569,  0.31709275, -0.31413315,  0.31395906,  0.2828309 ,
        0.30928036,  0.31378846,  0.27204749, -0.02081377,  0.04145551,
        0.05822502,  0.03027951,  0.07414539])), (3.014922058524633, array([ 0.12695376, -0.12557673,  0.07295164,  0.18963438,  0.12217486,
       -0.01074829, -0.04811814, -0.01274985, -0.05993525, -0.11622053,
       -0.06228062, -0.05378436, -0.20923317, -0.48852515,  0.05508997,
        0.12408509,  0.54091477,  0.54035426])), (1.9035250218389657, array([ 0.11992248,  0.02482055,  0.05601433, -0.27507421, -0.64201297,
       -0.5918013 ,  0.09762831, -0.05764844,  0.10951242,  0.0170642 ,
       -0.05632398,  0.10884073,  0.03146365, -0.28627701,  0.11567935,
        0.07528289, -0.00873592, -0.03952427])), (1.1799374684450215, array([-0.07838436, -0.18733741,  0.07120084,  0.04260534, -0.03272571,
       -0.03141473,  0.09574857, -0.0822902 ,  0.0924583 , -0.18800561,
        0.11984401,  0.09174493, -0.20009523,  0.06550514, -0.60479425,
        0.66611412, -0.10552625, -0.04748903])), (0.9172606328594372, array([-6.95178336e-02,  8.50649539e-02, -4.06645651e-02,  4.61473714e-02,
        4.05494487e-02, -2.13432566e-01,  1.54853055e-02, -7.68518712e-02,
       -2.17633157e-03,  6.06366845e-02,  4.56472367e-04,  1.95548315e-02,
        6.15991681e-02, -1.45530146e-01, -7.29189842e-01, -5.99196401e-01,
        1.00602332e-01,  2.98614819e-02])), (0.5399926288001127, array([ 0.14487548, -0.30273115, -0.13840577,  0.24813664,  0.23693261,
       -0.41933075,  0.11610015, -0.14184011,  0.09805613, -0.46167497,
        0.23622543,  0.15782019, -0.13557628,  0.24135682,  0.20320926,
       -0.1919608 ,  0.15693917, -0.24122282])), (0.3588701179293984, array([ 0.45186233, -0.24910339,  0.07403506, -0.17691281, -0.3978766 ,
        0.50341361,  0.06498794,  0.01381129,  0.09665731, -0.10455217,
        0.11462258,  0.08373502, -0.37394438,  0.11195298, -0.08063289,
       -0.28455872,  0.01814518,  0.01572378])), (0.2219324559989345, array([ 0.56613679,  0.17985181, -0.43474899, -0.10199836,  0.06871479,
       -0.1611531 , -0.10068806,  0.21549717, -0.06359339,  0.24949587,
       -0.05020963, -0.04376499,  0.1084745 ,  0.34087849, -0.15648767,
        0.20877408,  0.30458022,  0.03041863])), (0.16060859663511767, array([ 0.4844181 ,  0.0141569 ,  0.16757248,  0.23031356,  0.27712831,
       -0.14803225, -0.05445742,  0.15686736, -0.00524979,  0.06103624,
       -0.29758811, -0.08336698, -0.24165548, -0.32022189, -0.02210541,
       -0.01017618, -0.51722278, -0.17150634])), (0.09185722339516111, array([ 0.26007639, -0.09807791,  0.2050316 ,  0.04778889, -0.10807501,
        0.11826635, -0.1651672 ,  0.15161233, -0.19377792, -0.46906   ,
        0.12998601, -0.15820394,  0.6864937 , -0.12764838, -0.09836432,
        0.03551506, -0.01939562, -0.06413148])), (0.06649941176460208, array([-0.04653429, -0.00301324, -0.7064895 ,  0.10715158, -0.03851697,
        0.26225413,  0.1704058 ,  0.05766326,  0.27251403, -0.14143423,
       -0.07725966,  0.2432263 ,  0.15888839, -0.41918866,  0.01254476,
        0.03278081, -0.12059763, -0.09195978])), (0.04660059944187704, array([ 1.20344026e-02, -2.13635088e-01,  3.46330345e-04, -1.57049977e-01,
        1.10106595e-01, -1.32935328e-01,  9.55883216e-02,  1.22012715e-01,
        2.51281206e-01, -1.24529334e-01, -2.15011644e-01,  1.75685051e-01,
        1.90336498e-01,  2.85710601e-01, -1.60327156e-03, -8.32589542e-02,
       -3.53723696e-01,  6.85618161e-01])), (0.03579471891303873, array([-0.15613684, -0.01501167,  0.23711145,  0.03078187,  0.03928045,
       -0.03728843, -0.03946384,  0.81039486,  0.27157318,  0.07571058,
        0.15318081,  0.30794815, -0.03760875, -0.04346507, -0.00994305,
       -0.02689151,  0.18659515, -0.14238001])), (0.027412065737195113, array([ 0.06004852, -0.42699312,  0.14624027, -0.52137472,  0.36312036,
        0.06277968,  0.06405022, -0.18694615,  0.18091279,  0.1740703 ,
       -0.27727212,  0.07851417,  0.20068395, -0.14686161, -0.01733603,
        0.03136892,  0.23145105, -0.28850223])), (0.020579287070888724, array([-0.0096778 , -0.59786284, -0.15725714,  0.16655173, -0.06361387,
       -0.08631698, -0.07986931,  0.04215151, -0.14449063,  0.51125915,
        0.45323685, -0.12699225,  0.10998252, -0.11127196,  0.02409431,
       -0.00989652, -0.18221204,  0.09040147])), (0.01791663143223643, array([-0.06509567, -0.2612448 ,  0.07826517,  0.56079214, -0.32227687,
        0.04878096,  0.01818397, -0.02503302,  0.16449078,  0.14728009,
       -0.56444464, -0.06858569,  0.14709923,  0.23294126, -0.02775892,
        0.00278187,  0.19062996, -0.12096649])), (0.010025789847555906, array([-0.00600533,  0.07380594, -0.02507912, -0.03598804,  0.01258474,
       -0.02841688, -0.2496527 , -0.04214785,  0.71739629, -0.0470233 ,
        0.17150377, -0.61658938, -0.02649103, -0.01429595,  0.0017431 ,
       -0.00708895,  0.00767875,  0.00637682])), (0.002964457425044782, array([-0.01007288, -0.0091594 ,  0.006946  , -0.04201565,  0.03126981,
       -0.00999916,  0.84097566,  0.23818864, -0.10115459, -0.01694816,
        0.00604665, -0.46920276,  0.01174831,  0.00314812, -0.00303156,
       -0.0125316 ,  0.04342824, -0.00647701]))]
In [120]:
Eigenvalues in descending order: 
[9.404602609088705, 3.014922058524633, 1.9035250218389657, 1.1799374684450215, 0.9172606328594372, 0.5399926288001127, 0.3588701179293984, 0.2219324559989345, 0.16060859663511767, 0.09185722339516111, 0.06649941176460208, 0.04660059944187704, 0.03579471891303873, 0.027412065737195113, 0.020579287070888724, 0.01791663143223643, 0.010025789847555906, 0.002964457425044782]
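The pairing and sorting shown above can be sketched as follows. The 3 x 3 matrix is a hypothetical stand-in, and the variable names are assumptions rather than the notebook's own.

```python
import numpy as np

# Hypothetical symmetric matrix standing in for the covariance matrix.
cov_matrix = np.array([[1.0, 0.8, 0.1],
                       [0.8, 1.0, 0.2],
                       [0.1, 0.2, 1.0]])
eig_vals, eig_vecs = np.linalg.eig(cov_matrix)

# Pair each eigenvalue with its eigenvector (column i of eig_vecs).
eig_pairs = [(eig_vals[i], eig_vecs[:, i]) for i in range(len(eig_vals))]

# Sort on the eigenvalue only; comparing whole tuples would fall through
# to comparing arrays if two eigenvalues were equal.
eig_pairs.sort(key=lambda pair: pair[0], reverse=True)

eigvalues_sorted = [pair[0] for pair in eig_pairs]
print("Eigenvalues in descending order:\n", eigvalues_sorted)
```

Sorting the pairs together keeps every eigenvector attached to its own eigenvalue, which is needed later when selecting the leading components.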
In [121]:

Plotting The Explained Variance and Principal Components

In [122]:

Observation

  • From the plot above we can clearly observe that the first 8 principal components explain more than 95% of the variance in the data (about 97%, going by the eigenvalues listed above).
  • So we will use the first 8 principal components going forward and calculate the reduced dimensions.
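The cumulative explained variance can be recomputed directly from the sorted eigenvalues printed earlier. The sketch below copies those values (rounded to six decimals); the variable names are assumptions. By this calculation the 95% threshold is already crossed at 7 components, and 8 components cover roughly 97%.

```python
from itertools import accumulate

# Sorted eigenvalues copied from the output above, rounded to 6 decimals.
eigvalues_sorted = [
    9.404603, 3.014922, 1.903525, 1.179937, 0.917261, 0.539993,
    0.358870, 0.221932, 0.160609, 0.091857, 0.066499, 0.046601,
    0.035795, 0.027412, 0.020579, 0.017917, 0.010026, 0.002964,
]

total = sum(eigvalues_sorted)
var_explained = [value / total for value in eigvalues_sorted]
cum_var_explained = list(accumulate(var_explained))

for k, cum in enumerate(cum_var_explained[:8], start=1):
    print(f"{k} components: {cum:.1%}")

# Smallest number of components whose cumulative share reaches 95%.
n_needed = next(k for k, cum in enumerate(cum_var_explained, start=1) if cum >= 0.95)
print("components needed for 95%:", n_needed)
```

Keeping 8 components is therefore a slightly conservative choice relative to a strict 95% cut-off.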

Dimensionality Reduction

In [124]:
Out[124]:
0 1 2 3 4 5 6 7
0 0.334162 0.219026 -1.001584 -0.176612 -0.079301 -0.757447 -0.901124 0.381106
1 -1.591711 0.420603 0.369034 -0.233234 -0.693949 -0.517162 0.378637 -0.247059
2 3.769324 -0.195283 -0.087859 -1.202212 -0.731732 0.705041 -0.034584 -0.482772
3 -1.738598 2.829692 -0.109456 -0.376685 0.362897 -0.484431 0.470753 0.023086
4 0.558103 -4.758422 -11.703647 -0.147464 -3.256953 -0.203446 2.671578 0.448854
5 5.788913 -3.680602 2.010549 0.771052 0.393432 1.571298 1.266585 -0.172753
6 -0.773309 2.209779 0.124292 -2.114305 0.192185 0.791615 -0.147108 0.399470
7 -2.141137 1.176398 -0.655974 -0.825056 1.242390 -0.643791 -0.137203 0.417908
8 -4.458273 3.097744 -0.100556 0.551469 0.568003 -0.278121 0.291617 0.383483
9 0.937564 1.827625 0.076417 0.641740 0.404678 -0.343602 0.661844 -0.890702
10 -3.496120 1.765730 -0.288863 0.423805 0.683214 -0.295118 0.500235 -0.278815
11 -4.385098 2.429508 0.780084 1.309068 -0.852738 -0.052094 0.327329 0.401629
12 -0.824101 -0.023898 -0.451654 -0.299152 0.572621 -0.067019 -1.320285 0.188874
13 -1.410988 0.017099 0.118172 -0.144426 -0.504892 -0.635107 0.294670 -0.535174
14 1.204698 0.867959 -0.541575 -1.192222 1.391408 0.812077 -1.200664 0.499041
15 3.806035 -1.299234 -0.179771 -1.033485 1.123398 -0.142831 -0.701051 -0.492891
16 -5.285376 -1.737289 0.543063 0.856879 -0.159492 -0.090254 0.778860 0.882477
17 0.351813 1.596724 -0.497066 0.410143 0.782447 1.718050 0.203570 0.482944
18 4.168899 -1.162952 0.401351 -0.121471 0.495700 -0.518158 0.214084 0.216815
19 4.138882 -1.185603 0.051057 -0.703379 0.822731 -0.140849 -0.040904 -0.151851
20 -1.377868 -1.183899 -0.338608 -0.073117 1.671541 -0.257825 -0.797651 -0.197262
21 -5.114025 -2.259206 -0.079577 -0.585322 0.061602 0.902472 0.263385 -0.030265
22 -1.156183 2.060654 -0.596965 -1.706903 0.754918 0.569826 -0.437699 0.670980
23 -2.758101 0.859532 -0.148215 -0.080619 1.499065 0.045727 0.360061 -0.215406
24 3.868283 0.401624 -0.558639 -0.006056 1.807500 -0.578300 -0.224125 -0.310045
25 -1.446173 0.022235 -0.613919 -0.255401 1.391048 -0.898199 -0.352514 -0.335719
26 -4.836902 -2.352574 0.033921 -0.218285 0.043112 0.771770 0.417622 -0.206066
27 4.608520 -0.143999 -0.019001 1.688463 -0.016430 -0.675543 -0.155168 0.819769
28 1.548311 0.287847 0.090401 0.013082 0.610593 1.778963 0.586569 0.392052
29 -3.214755 -2.209652 0.319258 0.796331 -1.410559 0.461943 -0.264400 -0.462280
... ... ... ... ... ... ... ... ...
816 0.016274 -0.785303 -0.581386 0.196813 0.195267 -1.627901 -0.528587 0.289916
817 4.825136 0.616099 0.391465 0.289641 -2.036613 -0.515299 -0.353710 0.035988
818 1.551551 0.802066 -0.351121 1.651584 -0.897354 0.544654 -0.076262 -0.571300
819 1.074145 0.683495 0.065956 0.723625 -1.088896 0.726594 -0.087052 -1.107451
820 -3.643578 -1.531739 0.416411 0.923812 0.624368 -0.166112 0.850997 -0.452610
821 4.942559 -0.544091 -0.982453 0.109177 0.173764 0.099021 -0.780395 0.324079
822 -1.400054 2.729734 -0.420823 -0.372582 0.817922 -0.402141 0.384361 0.275930
823 3.882443 0.564842 -0.033649 -0.226736 -0.684327 -0.262839 -0.598303 0.130122
824 4.600146 -0.769824 0.421024 -0.457703 -0.247827 -0.457680 0.455470 0.171184
825 -1.690728 -0.838575 0.249256 2.137680 -0.078512 -0.159746 0.087165 -0.691208
826 1.278951 1.925134 -0.211006 -0.682425 1.904657 -0.434803 0.443712 -0.313621
827 4.589240 -1.302644 0.715027 -0.449041 0.926804 -1.766182 0.702958 -0.167075
828 -2.926212 -0.499590 0.882516 -0.397451 -1.048134 -0.643612 -0.086140 0.534861
829 -0.278020 -1.571026 0.300845 -0.687990 -0.193341 -1.424553 0.345360 0.266379
830 -1.942620 3.315289 -0.359661 1.165810 0.883361 0.614918 0.541667 0.700549
831 0.224383 2.137172 -0.478108 -1.424917 0.520468 0.430877 -0.415455 1.167027
832 4.218297 0.342006 0.487559 -1.240183 -1.204858 0.583314 0.645731 -0.464186
833 -0.507041 -0.785039 0.933987 -1.090790 0.035078 0.029312 -0.094998 -0.449322
834 -4.608957 -2.579611 0.757905 -1.176938 -1.281133 1.475938 0.225546 -0.216678
835 7.053452 -3.905816 2.015360 0.465308 0.807289 1.094625 1.023809 0.502322
836 -2.143056 -0.368251 0.067162 -0.040890 0.495869 -1.004984 -0.376749 0.533011
837 -0.370040 0.831673 0.343085 -1.843030 0.093118 0.053087 -0.022243 -0.131876
838 -1.255256 0.351723 0.238000 1.129099 -0.257318 -1.297831 0.355648 0.628974
839 -1.927389 1.695766 -0.192125 -1.586357 0.419239 -0.464435 -0.434285 0.194171
840 -3.726742 3.520109 0.782154 1.466184 -0.919050 -0.110103 0.515636 0.651691
841 -0.442648 0.605884 -0.197213 1.444958 -1.065425 0.820179 -0.041563 -0.506991
842 -0.314956 -0.164511 -0.794573 0.908272 0.235492 -1.438257 -0.599113 0.153086
843 4.809174 0.001249 -0.532333 -0.295652 1.344236 -0.217070 0.573249 -0.110478
844 -3.294092 1.008276 0.357003 1.933675 -0.042768 -0.402491 -0.202406 -0.320622
845 -4.765053 -0.334900 0.568136 1.224807 0.054051 -0.335637 0.058098 0.248035

846 rows × 8 columns

In [125]:
Out[125]:
<seaborn.axisgrid.PairGrid at 0x1740e2980f0>
After dimensionality reduction using PCA, our attributes have become uncorrelated with one another: most of the pairwise plots show a cloud of data points with no linear relationship.
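
The lack of correlation is a property of the projection itself and can be verified directly. The sketch below uses NumPy on synthetic data, which is an assumption standing in for the standardized vehicle features; it projects onto the leading eigenvectors of the covariance matrix and checks that the off-diagonal correlations vanish. Note that this shows the components are uncorrelated, which is a weaker property than statistical independence.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the standardized features (an assumption, not the real data):
# 200 samples, 5 correlated features built from 2 latent factors plus noise.
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 5))
X = latent @ mixing + 0.1 * rng.normal(size=(200, 5))
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# Eigen-decomposition of the covariance matrix, sorted by descending eigenvalue.
cov = np.cov(X_std, rowvar=False)
eigenvalues, eigenvectors = np.linalg.eigh(cov)
order = np.argsort(eigenvalues)[::-1]
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

# Project onto the first k principal components.
k = 3
projected = X_std @ eigenvectors[:, :k]

# Largest absolute off-diagonal correlation before and after the projection.
corr_before = np.corrcoef(X_std, rowvar=False)
corr_after = np.corrcoef(projected, rowvar=False)
max_off_diag_before = np.abs(corr_before - np.eye(5)).max()
max_off_diag_after = np.abs(corr_after - np.eye(k)).max()

print("max |corr| between original features:", max_off_diag_before)
print("max |corr| between principal components:", max_off_diag_after)
```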

Fitting The Model and Measuring The Score On Original Data

In [127]:

Fitting SVC model On Original Data

In [128]:
In [129]:
Out[129]:
SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
  decision_function_shape='ovr', degree=3, gamma='auto_deprecated',
  kernel='rbf', max_iter=-1, probability=False, random_state=None,
  shrinking=True, tol=0.001, verbose=False)
In [130]:

Fitting SVC On PCA Data:

In [131]:
In [132]:
Model Score On Original Data  0.952755905511811
Model Score On Reduced PCA Dimension  0.9330708661417323
Before PCA On Original 18 Dimension 0.952755905511811
After PCA(On 8 dimension) 0.9330708661417323
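
The code cells that produced these scores are not shown here. As a rough sketch of the same comparison, assuming scikit-learn and a synthetic 18-feature, 3-class dataset in place of the vehicle data, one could fit the same SVC once on all features and once on 8 principal components:

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the vehicle data (an assumption): 18 features, 3 classes.
X, y = make_classification(
    n_samples=600, n_features=18, n_informative=8, n_redundant=6,
    n_classes=3, random_state=1,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1, stratify=y
)

# Same classifier, with and without a PCA step after standardization.
svc_full = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
svc_pca = make_pipeline(StandardScaler(), PCA(n_components=8), SVC(kernel="rbf"))

score_full = svc_full.fit(X_train, y_train).score(X_test, y_test)
score_pca = svc_pca.fit(X_train, y_train).score(X_test, y_test)

print("Accuracy on all 18 features:", score_full)
print("Accuracy on 8 principal components:", score_pca)
```

Putting the scaler and PCA inside a pipeline means both are fitted on the training split only, so the test score is not influenced by test-set statistics.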

Confusion Matrix:

In [133]:
Confusion Matrix For : 
 Original Data Set [[ 58   0   1]
 [  1 129   3]
 [  6   1  55]]
Confusion Matrix For : 
 For Reduced Dimensions Using PCA  [[ 57   2   0]
 [  2 126   5]
 [  1   7  54]]
Classification Report For Raw Data: 
               precision    recall  f1-score   support

         0.0       0.89      0.98      0.94        59
         1.0       0.99      0.97      0.98       133
         2.0       0.93      0.89      0.91        62

   micro avg       0.95      0.95      0.95       254
   macro avg       0.94      0.95      0.94       254
weighted avg       0.95      0.95      0.95       254

Classification Report For PCA: 
               precision    recall  f1-score   support

         0.0       0.95      0.97      0.96        59
         1.0       0.93      0.95      0.94       133
         2.0       0.92      0.87      0.89        62

   micro avg       0.93      0.93      0.93       254
   macro avg       0.93      0.93      0.93       254
weighted avg       0.93      0.93      0.93       254

Confusion Matrix Analysis On Original Data:

Confusion Matrix For: Original Data Set

  • Our model on the original data set correctly classified 58 of the 59 actual vans and erred in only one case, where it wrongly predicted a van to be a bus.
  • Of the 133 actual cars, our SVM model correctly classified 129. It wrongly classified 3 cars as buses and 1 car as a van.
  • Of the 62 actual buses, our model correctly classified 55. It wrongly classified 6 buses as vans and 1 bus as a car.

Confusion Matrix Analysis On Reduced Dimensions After PCA

For Reduced Dimensions Using PCA:

  • Of the 59 actual vans, our model correctly predicted 57 and erred in 2 instances, where it wrongly classified vans as cars.
  • Of the 133 actual cars, our model correctly classified 126 and erred in 7 cases: it wrongly classified 5 cars as buses and 2 cars as vans.
  • Of the 62 actual buses, our model correctly classified 54. It erred in 8 cases: it wrongly classified 7 buses as cars and 1 bus as a van.
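
The counts in both analyses can be tied back to the accuracy scores and classification reports with a short calculation. This is a minimal sketch in plain Python; the helper `summarize` is introduced here for illustration, and the label order follows the van, car, bus reading used in the analysis above.

```python
# Confusion matrices copied from the output above (rows = actual, columns = predicted).
cm_original = [[58, 0, 1], [1, 129, 3], [6, 1, 55]]
cm_pca = [[57, 2, 0], [2, 126, 5], [1, 7, 54]]
labels = ["van", "car", "bus"]  # class order as read in the analysis above


def summarize(cm):
    """Hypothetical helper: return (accuracy, per-class precision, per-class recall)."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(n))
    # Recall: correct predictions divided by the row total (actual instances).
    recall = [cm[i][i] / sum(cm[i]) for i in range(n)]
    # Precision: correct predictions divided by the column total (predicted instances).
    precision = [cm[i][i] / sum(cm[r][i] for r in range(n)) for i in range(n)]
    return correct / total, precision, recall


acc_original, prec_original, rec_original = summarize(cm_original)
acc_pca, prec_pca, rec_pca = summarize(cm_pca)

for name, acc, prec, rec in [
    ("Original data", acc_original, prec_original, rec_original),
    ("PCA (8 dimensions)", acc_pca, prec_pca, rec_pca),
]:
    print(f"{name}: accuracy = {acc:.4f}")
    for label, p, r in zip(labels, prec, rec):
        print(f"  {label}: precision = {p:.2f}, recall = {r:.2f}")
```

The accuracies come out to 242/254 and 237/254, matching the two model scores reported earlier.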

Let's Apply Grid Search & Cross-Validation To Tune Our Model and Validate The Model's Accuracy Score

In [134]:
In [135]:
SVM Parameters: {'C': 1.0, 'cache_size': 200, 'class_weight': None, 'coef0': 0.0, 'decision_function_shape': 'ovr', 'degree': 3, 'gamma': 'auto_deprecated', 'kernel': 'rbf', 'max_iter': -1, 'probability': False, 'random_state': None, 'shrinking': True, 'tol': 0.001, 'verbose': False}

Tuning SVM Hyperparameters:

Iteration 1: In Case Of PCA:

In [136]:
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-136-94130ff7b0c4> in <module>
----> 1 classifiers_hypertune("Support Vector Classifier", svmc, param_grid,X_train_std_pca, SplitScale_y_train, X_test_std_pca, SplitScale_y_test,10)

NameError: name 'X_train_std_pca' is not defined
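
The call above fails because `X_train_std_pca` is not defined in the session, and neither `classifiers_hypertune` nor `param_grid` is shown in this section. As a rough sketch of what such a tuning step could look like, assuming scikit-learn's `GridSearchCV` in place of the helper, a synthetic dataset in place of the vehicle data, an illustrative parameter grid, and 10-fold cross-validation (a guess at the meaning of the trailing `10` in the failed call):

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the vehicle data (an assumption): 18 features, 3 classes.
X, y = make_classification(
    n_samples=300, n_features=18, n_informative=8, n_redundant=6,
    n_classes=3, random_state=1,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1, stratify=y
)

# Standardize, reduce to 8 principal components, then classify.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("pca", PCA(n_components=8)),
    ("svc", SVC()),
])

# Illustrative grid only; the notebook's actual param_grid is not shown.
param_grid = {
    "svc__C": [0.1, 1, 10],
    "svc__gamma": ["scale", 0.01, 0.1],
    "svc__kernel": ["rbf", "linear"],
}

# 10-fold cross-validated grid search over the SVC hyperparameters.
search = GridSearchCV(pipeline, param_grid, cv=10, scoring="accuracy")
search.fit(X_train, y_train)

print("Best parameters:", search.best_params_)
print("Best cross-validated accuracy:", search.best_score_)
print("Test accuracy of the best model:", search.score(X_test, y_test))
```

Running the search over a pipeline refits the scaler and PCA inside every fold, which keeps the cross-validated score free of leakage from the held-out fold.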
